Tensor Normal Training for Deep Learning Models

Neural Information Processing Systems

Based on the so-called tensor normal (TN) distribution [31], we propose and analyze a brand-new approximate natural gradient method, Tensor Normal Training (TNT), which, like Shampoo, only requires knowledge of the shape of the training parameters.
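For intuition, here is a minimal NumPy sketch of the Kronecker-factored preconditioning idea behind shape-only methods like TNT and Shampoo; the EMA statistics, damping value, and function name are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def kronecker_preconditioned_step(W, G, A, B, beta=0.95, damping=1e-3, lr=1e-2):
    # One illustrative update for a weight matrix W (and its gradient G) of shape (m, n).
    # A (n x n) and B (m x m) are running Kronecker factors built only from
    # gradient statistics, so nothing beyond the parameter shape is needed.
    m, n = G.shape
    A = beta * A + (1 - beta) * (G.T @ G) / m   # right-factor statistics
    B = beta * B + (1 - beta) * (G @ G.T) / n   # left-factor statistics
    # Damped inverses approximate the inverse curvature as a Kronecker product:
    # (B kron A)^{-1} vec(G)  is equivalent to  B^{-1} G A^{-1}.
    A_inv = np.linalg.inv(A + damping * np.eye(n))
    B_inv = np.linalg.inv(B + damping * np.eye(m))
    W = W - lr * (B_inv @ G @ A_inv)
    return W, A, B

Because the two factors are only n x n and m x m, this preconditions an m*n-dimensional gradient without ever forming the full m*n-by-m*n curvature matrix.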



MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates

Neural Information Processing Systems

This work proposes a Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates, called MKOR, that improves the training time and convergence properties of deep neural networks (DNNs). Second-order techniques, while enjoying higher convergence rates than their first-order counterparts, have cubic complexity with respect to the model size and/or the training batch size.
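The cubic complexity the abstract mentions is what rank-1 updates sidestep: if each Kronecker factor is refreshed with one outer product per step, its inverse can be maintained with the Sherman-Morrison identity in O(d^2) work instead of an O(d^3) re-inversion. A hedged NumPy sketch of that idea follows; the EMA form and constants are assumptions for illustration, not MKOR's exact update rule.

import numpy as np

def rank1_inverse_update(C_inv, x, beta=0.95):
    # Maintain the inverse of an EMA covariance C <- beta*C + (1-beta)*x x^T
    # directly, so no O(d^3) matrix inversion is ever performed.
    # Writing C_new = beta * (C + c * x x^T) with c = (1 - beta) / beta,
    # Sherman-Morrison gives:
    #   C_new^{-1} = (C^{-1} - c * (C^{-1}x)(C^{-1}x)^T / denom) / beta.
    c = (1.0 - beta) / beta
    Cx = C_inv @ x                  # O(d^2) matrix-vector product
    denom = 1.0 + c * (x @ Cx)
    return (C_inv - c * np.outer(Cx, Cx) / denom) / beta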

